MATRIX: MAny-Task computing execution fabRIc at eXascale
Abstract
Efficiently scheduling large numbers of jobs over large-scale distributed systems is critical to achieving high system utilization and throughput. Today’s state-of-the-art job management systems predominantly use master/slave architectures, which have inherent limitations, such as scalability issues at extreme scales (e.g. petascale and beyond) and a single point of failure. In designing the next-generation distributed job management system, we must address new challenges such as load balancing. This paper presents the design, analysis and implementation of a distributed execution fabric called MATRIX (MAny-Task computing execution fabRIc at eXascale). MATRIX uses an adaptive work stealing algorithm for distributed load balancing, and distributed hash tables for managing task metadata. MATRIX supports both high-performance computing (HPC) and many-task computing (MTC) workloads, as well as task dependencies in the execution of complex large-scale workflows. We have evaluated it using synthetic workloads at up to 4K cores on an IBM Blue Gene/P supercomputer, and have shown that high efficiency (e.g. 85%+) is possible for certain workloads with task granularities as low as 64 ms. MATRIX has achieved throughput as high as 13K tasks/sec at 4K-core scale (one to two orders of magnitude higher than existing centralized systems). We also explore the feasibility of adaptive work stealing at up to 1M-node scale through simulations.
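To make the load-balancing idea concrete, the following is a minimal sketch of work stealing with an adaptive retry interval. It is not MATRIX's implementation: the `Scheduler` class, the steal-half policy, the choice of two random candidate peers, and the doubling back-off are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random
from collections import deque


class Scheduler:
    """One node-local scheduler with its own task queue (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.poll_interval = 1  # assumed initial back-off, in arbitrary ticks

    def steal_from(self, victim):
        """Move half of the victim's queued tasks here; return the count moved."""
        count = len(victim.queue) // 2
        for _ in range(count):
            self.queue.append(victim.queue.pop())
        return count

    def try_steal(self, peers, rng, num_candidates=2):
        """Probe a few random peers and steal from the most loaded one."""
        candidates = rng.sample(peers, min(num_candidates, len(peers)))
        victim = max(candidates, key=lambda s: len(s.queue))
        stolen = self.steal_from(victim)
        if stolen:
            self.poll_interval = 1   # success: reset the back-off (assumed policy)
        else:
            self.poll_interval *= 2  # failure: wait longer before retrying
        return stolen


def balance(schedulers, rng, max_rounds=100):
    """Let idle schedulers steal until nobody is idle or rounds run out."""
    for _ in range(max_rounds):
        idle = [s for s in schedulers if not s.queue]
        if not idle:
            break
        for thief in idle:
            peers = [s for s in schedulers if s is not thief]
            thief.try_steal(peers, rng)
    return [len(s.queue) for s in schedulers]


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = [Scheduler(f"s{i}") for i in range(4)]
    nodes[0].queue.extend(range(64))  # all work starts on one scheduler
    print(balance(nodes, rng))
```

In this sketch only idle schedulers initiate communication, so no central dispatcher is needed. The doubling `poll_interval` stands in for the "adaptive" part: a scheduler that repeatedly finds nothing to steal would probe its peers less often.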
Similar resources
SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale
Exascale computing has challenges, most of which can potentially be addressed by the many-task computing paradigm through efficient task execution frameworks that are several orders of magnitude beyond current batch schedulers. This paper proposes a lightweight discrete event simulator, SimMatrix, which simulates a distributed job scheduler comprising millions of nodes and billions of cores/task...
Paving the Road to Exascale with Many-Task Computing
Exascale systems will bring significant challenges. This work attempts to address them through the Many-Task Computing (MTC) paradigm, by delivering data-aware job scheduling systems and fully asynchronous distributed architectures. MTC applications are structured as directed acyclic graphs (DAGs) of tasks, with dependencies forming the edges. The asynchronous nature of MTC makes it more resilient than tradition...
SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascales
Exascale computers will enable the unraveling of significant scientific mysteries. Predictions are that by 2019, supercomputers will reach exascales with millions of nodes and billions of threads of execution. Many-task computing (MTC) is a new viable distributed paradigm for extreme-scale supercomputing. The MTC paradigm can address four of the five major challenges of exascale computing, name...
Task Scheduling Algorithm Using Covariance Matrix Adaptation Evolution Strategy (CMA-ES) in Cloud Computing
Cloud computing is a computational model that provides resources for users' requests on demand. The need to plan the scheduling of users' jobs has emerged as an important challenge in the field of cloud computing. This is mainly due to several reasons, including ever-increasing advancements in information technology and an increase in applications and...
An Effective Task Scheduling Framework for Cloud Computing using NSGA-II
Cloud computing is a model for convenient, on-demand user access to changeable and configurable computing resources such as networks, servers, storage, applications, and services, with minimal resource management and service provider interaction. Task scheduling is regarded as a fundamental issue in cloud computing which aims at distributing the load on the different resources of a distribu...
Publication date: 2013